ABSTRACT

Recent reports by the National Council of Teachers of 
Mathematics (NCTM), the National Research Council, the Mathematical 
Sciences Education Board, and the Board on Mathematical Sciences 
emphasize that coordinated improvement in the content of mathematics 
curriculum, classroom instruction, and teacher education is the key 
to improving mathematics education. Consequently, methods of
assessing students' mathematics achievement also need to be 
reevaluated. This bulletin focuses on the impact that assessment has 
and can have on mathematics curriculum and instruction. The bulletin 
is divided into two sections. The first section addresses assessment 
done by the teacher as part of classroom instruction. Topics 
discussed include: teacher competencies in assessment, assessment 
techniques as recommended by the NCTM, reform of large-scale 
mathematics testing, and the need for equal opportunities for all 
students to learn mathematics. The second section discusses the 
impact that certain uses of large-scale testing can have on 
curriculum revision efforts. Topics discussed include: the perception 
of testing as a gatekeeper to curriculum reform and the revision of 
standardized tests to reflect current curricular reforms. A list of 
36 references is included. (MDH) 
By the mid 1980s, mathematics education in the United
States was under intense pressure to make major changes at all
levels. Many students were dropping out of mathematics at
their earliest opportunities, and others were being held for an
inordinate amount of time in routine arithmetic skill courses.
African-Americans and Hispanics were most likely to be in
these categories. Undergraduate students majoring in
mathematics-related fields were decreasing in number, and
increasing numbers of students, especially at the graduate
level, were not U.S. citizens. Business and industry were
decrying the inability of new employees to deal with
mathematical problems or mathematical content beyond the most
basic level. This problem was corroborated by the National
Assessment of Educational Progress, which showed that a
nationally representative sample of students could do basic
computation but was unable to apply mathematics in real world
situations. School and university mathematics were being
criticized on other fronts for being outdated and for failing
to prepare students to deal with the sophisticated technology
they would encounter in the workplace. Finally, a barrage of
criticism and calls for change followed the announcement that
American students in grades eight and eleven scored well below
the median in the Second International Mathematics Study. This
performance was much lower than that of students in highly
industrialized Japan and Western European nations.

In response to these pressures, the National Council of
Teachers of Mathematics organized its Standards Project in
1986, which has now resulted in the publication of two sets of
recommendations that together provide a coordinated vision of
the school mathematics curriculum, the teaching of
mathematics, and the assessment of mathematical abilities
(NCTM, 1989; 1991). Furthermore, the mathematics and
mathematics education communities through publications of
professional organizations have now articulated their
collective professional judgment of what exemplary practice and
recent research (in teaching and learning as well as in
mathematics itself) suggest for the mathematics curriculum.
Closely coordinated with the NCTM standards are recommendations
for reforming the undergraduate mathematics curriculum,
produced jointly by the National Research Council (1991), the
Mathematical Sciences Education Board, the Committee on
the Mathematical Sciences in the Year 2000, and the Board on
Mathematical Sciences. Recommendations for the mathematical
content for prospective teachers have also been made
(Mathematical Association of America, 1991).



These reports all emphasize that coordinated, cross-level
improvement in the content of the curriculum, in classroom
instruction, and in teacher education is the key to improving
mathematics education. In addition to curriculum, instruction,
and teacher education, methods of assessing the mathematical
performance of students must be reevaluated in light of the new
curricular goals. The above sets of recommendations are in
agreement on this last point, too, but only the two volumes
from the NCTM address assessment in a substantive way
(NCTM, 1989, 1991).

The term assessment is meant to have a broader meaning
than testing in traditional ways, most notably the ubiquitous
paper-and-pencil objective test. Assessment is not necessarily
restricted to formal evaluations of student achievement,
instruction, or program. This paper focuses on assessment as it
impacts, or could impact, mathematics curriculum and
instruction. It is divided into two main sections: The first
addresses assessment done by the teacher as part of classroom
instruction. The second section discusses the impact that
certain uses of large-scale testing, prepared by measurement
specialists outside the classroom, can have on curriculum
revision efforts.

Assessment in Support of Instruction

Teachers face daily unique problematic situations in their
classrooms. At any time, a student may illustrate confusion, yet
a partial understanding of a concept, in a way that is entirely
new to the teacher. Because these situations fall outside any
existing theory, the teacher will have no source of directly
applicable rules available. In order to respond competently,
the teacher must improvise and test strategies of his or her own
design in the situation. In short, the teacher must be a
reflective practitioner, applying in ad hoc ways all experience,
pedagogical knowledge of content, knowledge of students, and
the interaction of these components (Schon, 1990). A reflective,
highly competent teacher will be almost continuously assessing
students' understanding, their motivation to work on
certain tasks, their readiness to proceed to new activities,
their ability to work together, the effectiveness of a lesson
plan as he or she implements it, and so on. Assessment, in this
broad sense, is a very important part of instruction.

Professionals other than teachers, such as medical doctors,
lawyers, and clinical psychologists, generally conduct
assessments and make diagnoses that are required in their work.
These professionals may employ technicians and testing
instruments when they judge it to be appropriate, but they reserve
the right to analyze the assessment results and prescribe a
strategy or treatment that combines their best understanding of
all pertinent aspects of a situation. The ability to conduct 
assessments in their domain of expertise and to translate the 
results into directions for practice is a major part of what 
characterizes a professional in that domain. 

The teacher, as the professional in the classroom, is in the
best position to understand all pertinent variables that may help
solve local problems of instruction and learning. No one else 
has experience in this classroom, understanding of how these 
students learn and what motivates them, or understanding of
how to teach the subject matter to these students, what is likely 
to be difficult for them, and what content came before and what 
comes next in the curriculum. For the most part, however, 
teachers have not been well educated in assessment techniques 
and strategies. The education of teachers is usually far more 
prescriptive than the education of other professionals, focusing 
on rules about how to handle classroom disruptions and very
specific teaching activities that can be inserted directly into 
lesson plans. Strategies for assessing students' understanding 
of mathematics (or any other subject matter) are usually given
little attention beyond a brief unit on interpreting results of
norm-referenced tests and some experience writing and scor- 
ing tests during student teaching (NCTM, 1991; Wolf, Bixby, 
Glenn, & Gardner, 1991). As a result, teachers are much better 
at managing a class than they are at individual or group 
assessment, diagnosis, and prescription. Assessing, to most of 
them, means giving paper-and-pencil tests, either large-scale
tests developed by measurement specialists or their own tests 
for purposes of grading. 

Recognizing the need for teachers to become more skilled at 
assessment, a joint committee of the American Federation of 
Teachers, the National Council on Measurement in Education,
and the National Education Association developed the follow- 
ing seven standards for teacher competence in educational 
assessment (AFT, NCME & NEA, 1990). Teachers should be
skilled in: 

1. choosing assessment methods appropriate for 
instructional decisions; 

2. developing assessment methods appropriate for 
instructional decisions; 

3. administering, scoring, and interpreting the results of 
both externally produced and teacher-produced 
assessment methods; 

4. using assessment results when making decisions about 
individual students, planning teaching, developing 
curriculum, and school improvement; 

5. developing valid pupil grading procedures that use pupil 
assessments; 

6. communicating assessment results to students, parents, 
other lay audiences, and other educators; and 

7. recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment information. 

The above standards are certainly important for all teachers.
These standards, especially the first four, assume, however,
that teachers possess what Shulman (1986) called pedagogical
content knowledge. As a prerequisite to the above
competencies, teachers need a view of what is important for students to
understand and be able to do in the subject matter they are 
teaching, which student behavior indicates an understanding
of particular concepts in the subject matter, difficulties that
their students are likely to encounter in trying to learn
particular topics, how ideas in the curriculum are connected to one
another, and multiple ways to represent concepts that will 
better focus the assessment on deficiencies in an individual 
student's understanding. 

In fact, Shulman (1991) argued that the study of effective 
teaching in a generic sense has limited potential. Rather, 
teaching can only be understood by studying the knowledge of
pedagogy that is required to effectively teach specific content
to specific students in specific contexts. The same sort of
limitation (viewing assessment as content-free and
context-free) seems to apply to a generic set of assessment
competencies like the AFT-NCME-NEA standards given earlier.

As an illustration for a mathematics teacher, consider a 
junior high pre-algebra student who has no difficulty with
equations like the following:

x + 3 = 10

2x - 8 = 15 

The student has persistent difficulty, however, when the sides
of the equations are switched despite the teacher's best efforts
to point out the error. Frustrated teachers in such situations
often respond by blaming the student's carelessness or earlier
teachers' incompetence or lack of thoroughness. Such
unproductive responses are less likely if the teacher knows that this
is a very common error pattern due largely to the different uses
of the equal sign in arithmetic and in algebra. In arithmetic, the
equal sign is almost always a signal to do something to
whatever is on the left side of the equation to get the answer that
is then placed on the right side. In algebra, however, the same
symbol means the numerical value of the left side is the same
as that of the right side. Six or eight years of a student's
successfully viewing a concept like equal in one way cannot be
undone easily or quickly. Herscovics and Kieran (1980) have
designed and validated an instructional sequence that begins
with the arithmetic meaning of equality and leads the student
to expand that meaning to the algebraic concept. 

Researchers have analyzed how students at different levels 
learn other concepts and topics in mathematics and have
identified the kinds of errors and misconceptions that are
likely. For example, a great deal of research has focused on
elementary and middle school topics like fractions, decimals,
and story problems, which have different mathematical
structures (e.g., Hiebert & Behr, 1988). Algebraic or pre-algebraic
topics (like equality in the earlier example), variables, and
graphs have also been studied in detail from this perspective
(cf. Wagner & Kieran, 1989; Leinhardt, Zaslavsky, & Stein,
1990). Instructional sequences that help students learn these
concepts and avert or correct misconceptions have also been
developed and validated for many concepts (cf. Trafton, 1989;
Coxford, 1988). Teachers who are not well versed in these
pedagogical aspects of mathematics will be limited in their
ability to assess a student's performance and to make
appropriate adjustments in instruction.

For the vision of the mathematics curriculum common to 
these sets of recommendations to become a reality, textbooks 
and other instructional material that support these curricular 
goals will need to be developed. Assessment techniques and 
assessment instruments should also support the curriculum
and instruction. Thus, all assessment, whether developed by
the teacher or by persons or agencies outside the classroom,
should be aligned with the content topics of the curriculum and
should reinforce the view of what is important in the
curriculum and in classroom instruction.

Assessment Techniques 

The NCTM recommends that teachers should be able to do
the following:

1. use a variety of assessment methods to determine students'
understanding of mathematics;

2. match assessment methods with the developmental level,
the mathematical maturity, and the cultural background
of the student;

3. align assessment methods with what is taught and how it
is taught;

4. analyze individual students' understanding of, and
disposition to do, mathematics so that information about
their mathematical development can be provided to the
students, their parents, and pertinent school personnel;
and

5. base instruction on information obtained from assessing
students' understanding of, and disposition to do,
mathematics (NCTM, 1991; p. 110).

To be able to do all of this, teachers must surely have a solid
pedagogical knowledge of mathematics, as described earlier in
this paper, at least at the level that they are teaching, and a clear
understanding of how students learn. Like teaching, assessing
as a part of instruction cannot be meaningfully separated from
the teacher's knowledge of the subject matter and
understanding of how students learn that subject matter. In addition
to the teacher's knowledge of the subject matter and of the
students, the teacher's repertoire should include a variety of
assessment techniques and a plan that ties assessment to its
instructional purpose. A number of instructional purposes that
might drive the choice of an assessment method are given in
Table 1 (p. 4), which is a brief version of a table found in
NCTM (1989; pp. 200-201).

Few of the assessment methods in Table 1 are new. They
reflect and reemphasize, however, the rich variety of student
outcomes that should be intended in a mathematics class. High
scores on a standardized multiple-choice or short-answer test
are a desirable outcome but by no means the only one. Since
the assessment method gives a strong message to the student
about what is important (Lester & Kroll, 1990), all desirable
instructional outcomes should be assessed. Stated differently,
if an instructional outcome is not assessed, students are likely
to consider it to be unimportant and not worth their time and
attention, despite the teacher's emphasis or exhortations to the
contrary.

If NCTM recommendations such as solving applied
problems, reasoning mathematically, communicating
mathematical ideas, using tools such as calculators, and working with
other students are desired instructional outcomes, then some of
the assessment that "counts" should include observation and
analysis of students doing mathematics under these conditions.
For example, students could be given the following problem to
work on in a small group (California Mathematics Council,
1989; p. 21).

We have reached into this bag of blocks 6 times and have
pulled out 3 red blocks, 1 green block, and 2 blue blocks.
If you reached into the bag and pulled out another block,
what color do you think it would be? Explain why you
think it would be that color. How could you get more
information?

Questions concerning the students' problem-solving
performance that the teacher as observer might want to use as a focus
are the following:

Do students have a systematic way of organizing and
recording information?

Do they relate a problem to other similar problems?

Are they able to express their ideas orally?

Are they able to come up with ideas for getting more
information?

On the other hand, if the teacher wanted to assess the students'
disposition toward doing mathematics, the focus might be:

Do students plan before acting and revise their plan as
necessary?

Do they stick to the task without being easily distracted?

Do they use supplementary tools such as calculators as
needed?

Do they support their arguments with evidence?

Do they complete the task?

Do they review their solution process and their result?

Similarly, if the teacher was interested in how well students
worked as a group, he or she would focus on group interaction
and communication behaviors such as the following:

Do students engage in discussions in order to clarify and
communicate their ideas to others?

Do they describe their problem-solving processes clearly
enough so that they are replicable?

Do they have the confidence to make a report to the whole
class?

Do they capably and fairly represent a group consensus?

Do they synthesize and summarize individual and group
thinking?

Mathematics educators are also beginning to borrow the
idea of portfolio assessment from art and writing teachers. One
type of portfolio that shows a student's development would
contain a sampling of a student's earlier and more recent work
Table 1. Purposes and Methods of Assessment (NCTM, 1989; pp. 200-201)

Diagnostic

Purposes (examples of questions asked):

• What does this student understand about the concept or procedure?

• What aspects of problem solving are causing difficulty?

• What accounts for this student's unwillingness to attempt new problems or to see the application of previously learned
materials?

Assessment Methods:

• Oral questions that ask students to explain their procedure

• Focused written tasks

• Directed test items

• Observation

Instructional Feedback

Purposes:

• What do students know about the material presented?

• Can students apply their learning to new situations?

• Do students understand the connections among ideas?

• How shall I pace instruction?

• Does the class need more intensive review or more challenging material?

Assessment Methods:

• Written tests, including those that require differential methods for solutions to problems

• Class presentations

• Extended problem-solving projects

• Observation of class

• Take-home tests

• Homework, journals

• Group work and projects

Grading

Purposes:

• How well has this student understood and integrated the material?

• Can this student apply his or her learning in other contexts?

• How prepared is this student to proceed to the next grade or level?

Assessment Methods:

• Extended problem-solving projects

• Papers or written arguments that demand thoughtful inquiry about a mathematical topic

• Written tests that present problems with a range of difficulty based on expectations for the course

• Oral presentations

Generalized mathematical achievement

Purposes:

• How does the general mathematical ability of this student compare with others or with a national norm?

Assessment Methods:

• Standardized achievement tests

Program evaluation

Purposes:

• How effective is this instructional program in achieving our goals for mathematical learning?

Assessment Methods:

• Student interviews

• Performance tests

• Criterion-referenced tests

• Observation of class discussions

• Success of students who have completed the program
that illustrates various kinds of mathematical performances
(Wolf, et al., 1991). Such a portfolio in mathematics might
include a student's written descriptions of the results of
practical or purely mathematical projects, extended analysis of
problem situations and investigations, descriptions and
diagrams of problem-solving processes, statistical studies and
graphic representations of data, responses to open-ended
questions or homework problems, copies of awards or prizes,
video, audio, or computer-generated examples of student work,
a mathematical biography updated annually, and excerpts
from the student's mathematical journal (California
Mathematics Council, 1989). The result of keeping portfolios is that
teachers, students, and parents have access to a continuous
body of work that is an indication of the student's cognitive and
affective development over time.

Other assessment techniques and ways for teachers to record
and analyze data resulting from using these techniques are
described in many sources (e.g., California Mathematics
Council, 1990; Clarke, Clarke, & Lovitt, 1990; Lester & Kroll,
1990, 1991; NCTM, 1989, 1991; Webb & Briars, 1990). More
extensive guidelines for assessment in mathematics
classrooms will be appearing soon. For example, the 1993
Yearbook of the NCTM is tentatively entitled Assessment in the
Mathematics Classroom, and NCTM is presently working
with an author team on a publication for teachers with the
working title Mathematics Assessment: Myths, Models, Good
Questions, and Practical Suggestions.

Large-scale Mathematics Testing 

Large-scale paper-and-pencil testing, typically in a
multiple-choice format, has been highly visible in American
education for most of this century. The results of these tests
usually have little to say to an individual teacher about how to
improve classroom instruction. Rather, they serve to order
students and to compare them to national norms in the case of
standardized tests, to give an interpretation of student
performance in broad mathematical content areas in the case of the
National Assessment of Educational Progress and most state
assessments, or to give a country-by-country comparison of
average mathematics achievement. Most people agree that
these tests serve their purposes quite well.

The present nationwide furor over educational testing arises
from many misuses of the tests. For example, they are often
used to place students into different academic tracks; to judge
the quality of a curriculum, teachers, schools, and the U.S.
educational system as a whole; and to "drive" curriculum and
instruction (Popham, 1987). In the extreme, standardized tests
have been used for such high-stakes purposes as determining
the funding level for a district or school, the salary of individual
teachers, and whether individual students will graduate (Smith,
1991).

Large-scale achievement tests are, in theory, measurement
instruments that unobtrusively and reliably quantify the more
or less stable achievement of the student, "as if the student who
ticks off items were inert matter to be assayed and as if all the
agency and inquiry belongs to those doing the measuring"
(Wolf, et al., 1991, p. 46). If in fact as well as in theory such
tests were unobtrusive and had no significant effect on
students, teachers, curriculum, or instruction, they would
probably be of little concern to the mathematics education
profession. However, a growing body of recent research is
beginning to describe a substantial impact of misuses of
large-scale tests on curriculum and instruction, as well as on students
and teachers themselves. In a review of this research, Wolf, et
al. (1991) summarize the effects of this technically elegant
educational measurement system as follows:

It distorts instruction, underscores inequities in access to
education, and forecloses on students and teachers
becoming active participants in signal debates over the
standards that will be applied to their work. (p. 32)

Primarily as a result of such research findings, there is growing
recognition among testing experts, educators, and political
leaders that our large-scale testing practices need to be
reformed (e.g., National Commission on Testing and Public
Policy, 1990).

Joining in this general concern about testing practices, many
leaders in mathematics and mathematics education see the
widespread and often ill-advised uses of standardized testing
as a major barrier to the success of the present mathematics
curriculum improvement effort (e.g., Kulm, 1990). The
primary reason for this concern is that the distortions of
instruction alluded to by Wolf et al. are in direct conflict with the goals
of the mathematics curriculum improvement effort. The
distortions include narrowing of instructional focus;
fragmentation of content; emphasis on isolated individual efforts by
students with no tools beyond paper and pencil; use of basic
skill mastery as a gatekeeper to more interesting mathematics;
and de-emphasis on reasoning, problem solving, and
communication.

Mathematics for All Students

A very important goal of the mathematics reform effort is to
provide all students with equal opportunities to learn
mathematics beyond arithmetic skills. A barrier to that goal is the
use of test scores as gatekeepers to mathematics courses,
which, to make matters worse, also results in a
disproportionate exclusion of African-American and Hispanic students
from substantial mathematics experiences (Oakes, 1990; Wolf,
et al., 1991). Related to this last concern is another artifact of
our testing system that is troubling to mathematics educators.
Wolf et al. (1991) describe it as follows:

In essence, for all the sophistication of our testing system,
the concern for ranking and classifying has led to the
acceptance of a significant proportion of failure or poor
performance as "natural." The attention to...relative
information overshadows the responsibility to see that all
students learn and the necessity to provide explicit
information about students' current levels of achievement.
(p. 44)

The acceptance in the United States of a significant proportion
of failure or poor performance is nowhere more evident than
in mathematics. Many Americans are not ashamed to admit
publicly that they could never do mathematics and to justify
that inability by a lack of "a math gene" that somehow gives a
mysterious power to a select few, a power that is inescapably
beyond the grasp of most (NRC, 1989). This tendency to
accept failure in mathematics as natural is dramatically
illustrated in the series of cross-cultural studies of mathematics
achievement in the United States, Japan, and Taiwan (Stevenson,
Lummis, Lee, & Stigler, 1990; Stigler, Lee, & Stevenson,
1990).

Focusing on first-grade and fifth-grade classrooms (40 in 
each country) in similar schools in the three countries, these 
studies included testing mathematics achievement, observing 
classrooms, and measuring students', teachers', and mothers'
attitudes and attributions about mathematics. Overall, the 
superiority of the mathematical knowledge of the Asian stu- 
dents suggested by the Second International Mathematics 
Study (McKnight, et al., 1987) was found to be strong and 
consistent across various kinds of mathematical knowledge 
(Stigler, et al., 1990). The Asian students not only displayed
superior skill in computation but were even more impressive in
their performance on tasks that required an understanding of
the structure of mathematics. American students, in contrast,
tended to approach problems that required some
understanding or reasoning in a routinized manner, typically doing
without much thought some sort of calculations on all numbers
in the problem. American students also had much more
difficulty relating their knowledge of mathematics to the real
world. 

Despite this markedly inferior mathematical performance of 
their children, American mothers expressed a higher level of 
satisfaction than did Asian mothers with both their children's 
performance in mathematics and the mathematical education 
they were receiving in school: 

In the educational philosophies of Taiwan and Japan, 
nearly all children are believed to be capable of
understanding the content of the elementary school curriculum.
Lack of achievement is attributed to a failure to work hard.

Americans, in contrast, tend to place more emphasis on
innate ability as a major reason for variations in
achievement. In a sincere effort to provide experiences that
children at different levels of ability can manage,
American children are divided into groups or tracks according to
their presumed ability levels.

When adults convey an impression that some children are
not expected to keep up with others, the children's
motivation is likely to be diminished. Thus, paradoxically,
well-meant allowances for different levels of ability may
actually run the risk of decreasing the motivation to learn
and undermining the achievement of many American
children. (Stevenson, et al., 1990, p. 32)

This final observation by the authors is corroborated by a large
body of research on ability grouping (e.g., Oakes, 1990). 

From a different perspective, there is an interesting
probability argument against taking very seriously test scores and other
a priori criteria as gatekeepers to mathematics courses. The
argument is a variation on one put forth by Paulos (1988) in his
delightful book, Innumeracy. He pointed out that under certain
reasonable assumptions only about 20% of people identified as
having cancer by a 95%-accurate medical test can actually be
expected to be correctly identified.
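
The arithmetic behind a claim of this kind can be sketched with Bayes' rule. The example below is an illustration only. The function name `positive_predictive_value` and the prevalence figure of 1.3% are assumptions introduced here, chosen so that a 95%-accurate test gives roughly the 20% figure quoted above; they are not taken from Paulos's book, whose own assumptions are not reproduced in this paper. "Accuracy" is assumed to mean that the test gives the right answer 95% of the time for people with and without the condition alike.

```python
def positive_predictive_value(prevalence, accuracy):
    """Share of positive test results that are true positives.

    Assumes the test is correct with probability `accuracy` both for
    people who have the condition and for people who do not.
    """
    true_positives = prevalence * accuracy
    false_positives = (1 - prevalence) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)


# Hypothetical prevalence of 1.3% (an assumption for illustration only).
ppv = positive_predictive_value(prevalence=0.013, accuracy=0.95)
print(f"Share of positive results that are correct: {ppv:.0%}")
```

The rarer the condition, the more the false positives drawn from the large unaffected group outnumber the true positives, even when the test itself is quite accurate.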



Suppose a test (perhaps in combination with other criteria) 
is being used to place students into, say, general mathematics
rather than algebra. No criteria can be perfectly reliable and
take into account all salient variables (for example, how hard
the student will work during the coming year) that may
contribute to a student's success or failure in algebra. An
assumption of some random error, say 80% accuracy, in
placement criteria leads to a rather disturbing result. For
purposes of illustration, suppose that there is a theoretically
"correct placement" for each of 1000 students, namely, that the
correct placement for 900 of the 1000 is algebra and for the
other 100 is general mathematics. How accurately will the
students be placed into these courses? Eighty of the 100
students who should be in general mathematics will be
accurately placed on average, but 180 (20%) of the 900 students
who should be in algebra will be incorrectly placed in general
mathematics. If these criteria are rigidly applied, 260
students will be placed in general mathematics, and 180 (nearly
70%) of them should be in algebra. If there are any prejudices
against low socioeconomic or under-represented minority
students built into the placement criteria, and research
suggests that often there are such prejudices (Oakes,
1990), the mistaken exclusion of students in these groups from 
algebra will be even more dramatic. 
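
The placement arithmetic above can be checked with a short sketch. It assumes, as the illustration does, that each student is placed correctly with the same probability regardless of which course is the correct one; the function and variable names are hypothetical.

```python
def placement_counts(n_general, n_algebra, accuracy):
    """Expected general-mathematics placements under imperfect criteria.

    n_general: students whose correct placement is general mathematics.
    n_algebra: students whose correct placement is algebra.
    accuracy:  assumed probability that any one student is placed correctly.
    """
    # Students who belong in general mathematics and are placed there.
    correct_general = n_general * accuracy
    # Students who belong in algebra but are placed in general mathematics.
    misplaced_algebra = n_algebra * (1 - accuracy)
    total_general = correct_general + misplaced_algebra
    return correct_general, misplaced_algebra, total_general


correct, misplaced, total = placement_counts(100, 900, 0.80)
share_misplaced = misplaced / total
print(round(correct), round(misplaced), round(total))
print(f"{share_misplaced:.1%} of the general mathematics group belongs in algebra")
```

Changing the accuracy or the 100/900 split shows how strongly the result depends on the algebra group being much larger than the general mathematics group.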

In summary, the evidence concerning the negative effects of
ability grouping in mathematics, especially the exclusion of
traditionally under-represented minorities, is overwhelming
(e.g., Oakes, 1990). The fact that this practice persists in the
face of all the evidence and the moral arguments against it is
a dramatic illustration of the strong beliefs in this country that
standardized test scores are a valid measurement of a person's
natural ability or potential for success.

Testing as a Gatekeeper to Curriculum Reform 

A broader reason for mathematics educators' concern about
large-scale testing is that, when nationally publicized, student
performance on these tests can have very serious ramifications
for the overall direction of the mathematics curriculum. Even
if it is not designed to measure curriculum outcomes,
large-scale testing is a very powerful political vehicle for curriculum
reformers and their opponents alike, who recognize the power
of the press and the public's deeply ingrained "bottom line,
win-or-lose" achievement test view of what constitutes
success in an educational program. Low or declining scores on
such tests can help to slow or virtually stop a curriculum
improvement movement, as the widely publicized standardized
test score declines in the early 1970s did for the "modern
mathematics" efforts (National Advisory Committee on
Mathematical Education, 1975). Poor results on large-scale tests,
on the other hand, may provide some of the impetus for
reforming the curriculum, as the relatively poor performance
of American students in the Second International Mathematics
Study has done for the present reform efforts (McKnight,
Crosswhite, Dossey, Kifer, Swafford, Travers, & Cooney,
1987; NRC, 1989).

A good example of this phenomenon can be seen in some of
the publicity about the recent statewide comparisons of
mathematics achievement conducted by the National Assessment
of Educational Progress (NAEP). In an article in Newsweek,
June 17, 1991, entitled "A Dismal Report Card," Bill Honig,
California's superintendent of schools, used the relatively poor
results for his state as an opportunity to criticize local schools
for continuing to resist reforms like group problem solving and
creative thinking. In his words, "It's like we have a cure for
polio, but we're not giving the inoculation" (pp. 64-65). Shirley
Hill, chairperson of the Mathematical Sciences Education
Board, was quoted in the same article, using the NAEP results
to argue for reform: "Until recently, the public was perfectly
happy with students who could do the basics of adding and
subtracting. Now we realize how much more students need to
know, and people are going to be upset that they don't know it"
(p. 67).

On the other hand, these same NAEP results were an
occasion for others to argue against reforms like the NCTM
standards. A June 7, 1991, article in The Wall Street Journal
stated:

"There isn't any proof in this data that the recommended
NCTM practices are clearly more effective than any other
practices," said Archie Lapointe, executive director of
NAEP. John Saxon, a conservative critic of the NCTM
standards, was more blunt. "I don't believe the states that
did well are chasing (NCTM's) elusive star." (p. B1)

The authors of the same article go on to note that "students in
high-scoring states reported spending more classroom time
than others working on problems directly from math
textbooks, the kind of do-alone, routine activity that the
math-teaching group wants de-emphasized" (p. B1).

In reality, the NAEP results provide no evidence concerning
the effectiveness of the NCTM standards and no direct
evidence of the need for any particular educational reform,
although unacceptably low scores on NAEP suggest the need to
change something. The NCTM standards appeared in 1989
and, when NAEP was administered, most teachers in the
country had probably not even heard of them, let alone used
them to guide the curriculum or instruction in their classrooms.
Furthermore, NAEP made no claims to be in alignment with
the standards, and even if it had been, the NAEP results could
hardly be attributed to the mathematics curriculum. In fact,
NAEP's state rankings were very similar to those that have
resulted year after year when other achievement scores such as
SAT scores have been compared. It makes little sense to now
attribute those same rankings to state-by-state differences in
the level of implementation of the NCTM standards.

For mathematics educators interested in seeing major
curriculum revisions have an impact and get a fair trial in
American schools, this sort of misuse of tests and test results for
purposes of judging in the press the success of a new
curriculum is of great concern. Such misguided "curriculum
evaluation" surely contributes to the cycle of one failed educational
reform effort after another in this country. One can always
argue persuasively to the general public with these "hard
scientific data" from nationally representative samples of
students that whatever is now going on in curriculum and
instruction is a failure, and, therefore, a reform is justified. But
just as surely, the bottom-line, large-scale test score criterion
for success of a curriculum dooms the reform to failure once it
becomes viewed as "the present curriculum," because test
scores will never be high enough to please everyone.



Clearly, if education is to break out of this vicious cycle, our
win-or-lose, testing mentality must change. Society has a
moral obligation to care for and educate its children, and, in
particular, to give them the best mathematics education of
which we are capable. Scores on appropriate, large-scale tests
might help shape our efforts toward that goal, but they should
not be given the power to nullify our moral obligation or to
lessen the vigor with which we pursue the goal.

Revising Large-Scale Mathematics Assessment

Some large-scale tests are moving toward a closer alignment
with the mathematics curriculum envisioned in the
recommendations for reform. For example, since Fall 1990 the Scholastic
Aptitude Test is allowing the use of scientific calculators on its
Mathematics subtest, and a calculator is now a school-district
option on the Iowa Test of Basic Skills, which provides users
with norms both with and without calculators. The College
Entrance Examination Board has even announced its plans to
require calculators with graphing capabilities on the AP
Calculus Examination by 1994. In efforts to align more closely
with the NCTM standards, the Iowa Test of Basic Skills will
include a subtest on Estimation with items like those reported
in Schoen, Blume, and Hoover (1990), and its Problem Solving
subtest will include a number of items aimed at measuring
problem-solving processes, not just final answers, as
described by Schoen and Oehmke (1980). Figure 1 shows a
sample estimation and problem-solving process item.
Because these tests continue to use a multiple-choice format, they
will not satisfy critics concerned about the narrowing effects of
that format on instruction and learning.



1. 4 1/2 + 13 4/5 is between

a) 18 and 19

b) 17 and 18

c) 16 and 17

d) 19 and 20

2. The school cafeteria had 230 kg of milk to be shared by 46
children. The cook wanted to know how many glasses of
milk each child could have. The cook could solve the
problem if he also knew:

a) There are 1000 grams in a kilogram.

b) Each glass holds 2 kg of milk.

c) The children all like milk.

d) Each glass is 8 cm high.



Figure 1. Sample of the types of estimation (#1) and
problem-solving process (#2) items to be included on the Iowa Test of
Basic Skills
There are also many current efforts by researchers and
measurement specialists to broaden the assumptions and goals
of testing to bring it more in line with various aspects of
teaching or learning. These efforts include connecting
assessment to cognitive development and the cyclical nature of
learning (Romberg, Zarinnia, & Williams, 1990), measuring
students' pertinent knowledge at the beginning of an
instructional sequence and supplementing that with measures of how
readily they can understand new skills or procedures just
beyond their competence level (Campione, Brown, & Connell,
1988), and assessment of students' schematic organization of
knowledge (Marshall, 1988). All of these approaches deserve
and require more research and development.

Many practitioners in education have despaired of efforts to
shore up familiar and well-articulated forms of testing. They
believe that what is needed is an entirely new or reclaimed view
of quite different modes of assessment similar to some of those
described earlier in this paper for classroom teachers. These
modes include observation and analysis of students' work or of
students' performance on complex tasks or of portfolios of
their work (e.g., Romberg, et al., 1990; Silver & Kilpatrick,
1988; Wolf, et al., 1991). National assessments in New
Zealand and Great Britain and state assessments in California,
Connecticut, and Vermont have moved in this direction, and
many other states and school districts are beginning to follow
suit (Wolf, et al., 1991).

Even the U.S. national assessment system under development
by the New Standards Project, under the direction of
Lauren Resnick at the University of Pittsburgh's Learning
Research and Development Center, aims to make such
performance-based measurement its cornerstone (as reported in the
Report on Education Research, September 18, 1991). In the
same report, Eva Baker, director of the National Center for
Research on Evaluation, Standards and Student Testing, warned
that:

Our experiences lead us to believe that valid performance-based
assessments can be developed. They take time,
conceptual models, and careful empirical work. We fear
that the present policy press for such measures will short-cut
the process and . . . result in tasks whose validity cannot
be supported. (p. 5)

Performance-based assessment has not had the 50 or so years
that the traditional system of educational testing has had to
develop its scientific base; therefore, it has less exactness and
elegance. That state of affairs should suggest caution to the
proponents of large-scale performance-based assessment. On
further consideration, however, one might ask why it is
important that we have a scientifically elegant educational
measurement system. Is such a system, in itself, an end worth striving
for, or is it more appropriately a means to a greater end, namely,
supporting and improving the American educational system?

If it is the former, we should probably stick to our traditional
testing system. It is unlikely that performance-based
assessment, portfolios, and the other open-ended attempts to assess
human performance while maintaining much of its complexity
will ever be as reliable as a multiple-choice, standardized test.
If we see, however, our educational measurement system as a
means to support and improve the American educational
system, then we should look beyond the goal of scientifically
elegant measurement. We should also not forget that,
according to the Second International Mathematics Study, which is
based on a traditional mathematics achievement test, students
in at least eight or ten countries are learning more mathematics
than our students are (McKnight, et al., 1987). None of the
educational systems in these countries has a testing system that
is as efficient and technologically sophisticated as ours.

Furthermore, America's higher education system has, at
least until recently, been considered to be the most successful
in the world. In Japan, for example, our public schools are
scoffed at, but our system of higher education is envied and
imitated (Taylor, 1983). It may or may not be a coincidence
that, again until recently, higher education has not been
dominated and judged by standardized tests; the argument that its
mission and goals are too complex to be reduced to
multiple-choice tests and quantitative comparisons has prevailed.
Perhaps the argument about the complexity of the higher
education enterprise applies equally to education at all levels. At any
rate, our goal of maintaining a scientifically elegant system of
educational measurement for elementary and secondary
education appears over the years to have been raised to a level well
out of proportion to its potential for supporting and improving
our educational system.

The concept of a test-driven or assessment-driven
curriculum seems to be a classic example of giving too much power
to measurement and too little to curriculum and instruction.
No matter how complex or realistic an assessment procedure,
it seems to have a narrowing effect on teachers and students
who are to be judged by it. Their time and effort begin to go
toward figuring out what it takes to succeed on the test, and
they work toward those skills whether or not the skills are
appropriate or valuable mathematics. A good example of this
is the mathematical Tripos in Great Britain around 1900.
Well-known mathematicians like G. H. Hardy, Bertrand Russell,
and J. E. Littlewood took one or two years out of their
mathematical educations to work with special tutors to learn
test-taking tricks and problem types in preparation for this
highly competitive and mathematically sophisticated
examination. Winners, including Littlewood, were called wranglers
and gained a great deal of public adulation. The Tripos was a
mathematically sophisticated, open-ended examination which
was considered to be a measurement of higher-order thinking.
Yet, Hardy (who finished third on the Tripos) later wrote
bitterly about the terrible waste for young British
mathematicians who spent years on learning otherwise useless test-taking
techniques when they should have been doing mathematical
research (Kanigel, 1991).

Like the Tripos examination's domination of mathematics,
the assessment-driven curriculum concept reverses the roles of
curriculum and assessment. To revitalize the mathematics
curriculum, it is necessary that assessment be aligned with the
curriculum. Simply developing and using assessment
techniques or instruments that are aligned with the new curriculum
goals will not guarantee, however, that the curriculum will
become a reality. Worse yet, an educational policy which
assumes that assessment drives curriculum is likely to divert
resources to assessment that should be going to much more
important areas of education like classroom teaching, student
learning, curriculum development, and teacher education.
Summary

Assessment can and should be a major part of mathematics
classroom instruction, but teachers have not usually been well
prepared in this area. They should learn a variety of assessment
techniques that include but go beyond paper-and-pencil testing
and combine that with a solid pedagogical knowledge of
mathematics and of the students they teach. Both teacher
education and the way teachers teach in their own classrooms
must change radically if this ambitious goal is to be reached.
Such changes will not come easily, but the effort could result
in more professional, reflective mathematics teachers. In turn,
such teachers are likely to foster improved student
performance in mathematics.

Large-scale achievement tests, research suggests, have been
misused in various ways resulting in a number of negative
effects on students, teachers, and the curriculum. In an attempt
to circumvent some of these negative effects and to better align
assessment with current curriculum goals, new or reclaimed
forms of large-scale performance-based assessment are now
being used in various districts, states, and countries including
the New Standards Project in the United States.

The jury is out on the scientific characteristics of these
approaches to assessment and on the effects they may have on
teachers and students. Whatever the verdict, however,
mathematics is in a state of flux. To be well educated mathematically
does not mean the same thing as it did 50 years ago or even
ten years ago. The rapid changes occurring everywhere in
society, especially in technological developments, suggest a
need for a thorough and continuous rethinking of our definition
of mathematics achievement as operationalized by our
traditional tests. Leaders in the mathematics and mathematics
education professions believe they should play an important
role in helping to shape the mathematical content of, and the
nature of mathematics that is implicit in, large-scale
assessment techniques.